Abstract

The sample median is often used in statistical analyses of physical or astronomical data wherein 
a central value must be found from samples polluted by elements which do not belong to the 
population of interest or when the underlying probability law is such that the sample mean is 
useless for the stated purpose. 

Although it does not generally possess the convenient linearity properties of the mean, the median has
advantages of its own, some of which are explored in this paper, which elucidates analogies and
differences between these two descriptors of central value. Some elementary results are shown; most
of them are certainly not new, but neither are they widely known.

It is observed that the moment and the quantile approaches to the description of a probability 
distribution are difficult to relate, but that when the quantile description is used, the sample 
median can be characterized very much in the same way as the sample mean; this opens the 
possibility of using it for estimation purposes beyond what is usually done. 

In order to relate the two approaches, a derivation is given of the asymptotic joint distribution of 
the mean and the median for a general continuous probability law. 
I. INTRODUCTION

As a measure of the central value of a continuous probability distribution, the median,
which halves it into pieces of equal probability, is certainly as valuable an indicator as
the expectation value (E.V.). It has the further advantage of existing for all continuous
distributions, contrary to the E.V., which is not always defined, as exemplified by the well-known
Lorentzian (alias Breit-Wigner, alias Cauchy) density function.

In practical problems, the superiority of the sample median over the mean manifests
itself when a central value must be derived from samples polluted by data which do not
belong to the population of interest. In those cases, it has the advantage of a lower
sensitivity to 'outliers', that is, to abnormally high or low values which most likely come
from the contaminating data.

It is shown in this paper that, given a sample of n independent random variables with
the same continuous parent distribution, the sample median possesses, with respect to the
parent distribution median, properties which are quite similar to those of the sample mean
w.r.t. the parent E.V. It will be argued that the often invoked difficulties in calculating
the characteristics of the median's distribution arise from ill-posed questions, and that
simple answers exist provided one does not try to fit a round peg in a square hole. To take but one
example, it is in general not possible to calculate the E.V. of the sample median analytically,
but attempting this calculation is trying to answer the wrong question: it is the
distribution median (and not the mean) of the sample median which ought to be calculated,
and this can be done easily and in full generality. A by-product of this point of view is that
the often made comparison between the merits of the sample median and the sample mean
based on their respective standard deviations, which are defined through expectation values, might not
be as well grounded as it is thought to be and should not necessarily lead one to prefer the mean.
This is not to say that there are no problems in using the median. First and foremost,
the distribution median is generally not defined stricto sensu for a discrete random variable,
and the sample median for discrete data needs some kind of interpolation to be defined; we
shall not deal with those problems here and content ourselves with discussing continuous
distributions. But even for these, we must further restrict ourselves to one-dimensional
random variables: applying the standard definition to a multidimensional distribution
would yield a median point which varies with the chosen axes.

Another drawback arises from the fact that the median is not, in general, a linear operator
over the vector space of random variables; the question of deciding when it is linear is a
difficult one (obviously akin to the problem of the multidimensional median just mentioned)
and we shall not deal with it in full generality in this note.

The third kind of problem has been mentioned above; it arises from ill-posed questions, and we
shall see how to dispose of it.

The plan of this paper is as follows: 
Part II is a very short reminder of basic facts about order statistics which will be needed in 
the sequel. 

Part III is devoted to demonstrating that the median (we are speaking here of the random
variable as well as of the 50% quantile) can arguably be used to characterize the "center" of
a distribution as legitimately as the mean (we are speaking here of the random variable as
well as of the first moment); both random variables are shown to behave similarly w.r.t.
the samples from which they are derived. It is further remarked that the median possesses
some qualities which have no counterpart for the expectation value (obviously, the converse
is also true).

Part IV makes use of these findings to show how the median can be employed for estimation 
purposes. The example of the estimation of a ratio is explored. 

Part V is devoted to a short attempt at defining a dispersion characteristic for the median
in line with the quantile type of approach to a probability distribution. The variance, being
an expectation value, is not well suited to this role, and we propose other solutions based on
intervals.

Part VI is slightly off the main line of argument, but is included for completeness and because
it might be of use to people who need to define the centroid of a distribution by some linear
combination of the mean and the median. The full asymptotic joint probability distribution
of these two statistics is derived; to the best of the author's knowledge, the expression of
the covariance is new.

Part VII contains our conclusions. An appendix is added to demonstrate that the problem
of building an unbiased ratio estimator, addressed in Part IV, is not solvable in the usual
sense.

For pedagogical purposes, all demonstrations are given in full, even when they can easily be
found in the literature.

II. ORDER STATISTICS AND DISTRIBUTION QUANTILES 

From now on and unless otherwise stated, it will be understood that the parent distribution 
of the random variables forming the samples that we shall deal with is continuous and 
that its c.d.f. (cumulative distribution function) is strictly increasing on its domain of 
variation. As a consequence, for any $q \in\, ]0, 1[$ there exists one and only one value $m_q$ such
that $F(m_q) = q$; $m_q$ is known as the $q$-quantile of the distribution. The particular case
$q = 1/2$ corresponds to the (distribution) median $m_{1/2}$.

Given a sample of $n$ random variables $X_1, X_2, \dots, X_j, \dots, X_n$, we define its $k$-th order statistic,
or its $k$-th sample quantile, $Y_k$ as the random variable which takes the value of the $k$-th of the $X$'s
when these are renumbered in ascending order. In other words, $Y_1$ is defined as the smallest
of $X_1, \dots, X_n$, $Y_2$ as the next smallest, and so on, for any realisation of the sample.

In what follows, we shall assume that the $X$'s are i.i.d. (independent, identically
distributed) with continuous c.d.f. $F(x)$ and, should the need arise, p.d.f. $f(x) = dF/dx$.
Observe that, together with the hypothesis stated at the beginning of this section, this implies
that there is zero probability that any two of the $X$'s will be equal. The $Y$'s are therefore
unambiguously defined.

Lemma 1: The distribution of the $k$-th order statistic is given by

$$dP_k(x) = \frac{n!}{(k-1)!\,(n-k)!}\, F(x)^{k-1}\, \bigl(1 - F(x)\bigr)^{n-k}\, dF(x) \qquad (1)$$
Proof: $Y_k$ lies between $x$ and $x + dx$ if and only if $k - 1$ among the $X$'s take
their values below $x$ (probability $F(x)^{k-1}$), one takes its value between $x$ and $x + dx$
(probability $\simeq dF(x)$) and the remaining $n - k$ take their values above $x + dx$ (probability
$(1 - F(x + dx))^{n-k}$). However, the first $k - 1$ can be chosen in $C_n^{k-1}$ mutually exclusive
ways, to each of which can be associated $n - k + 1$ exclusive ways of choosing the $k$-th, and
$C_n^{k-1}\,(n - k + 1) = n!/((k-1)!\,(n-k)!)$.
Neglecting the $dx$ in $F(x + dx)$, which could only generate terms of second order and higher,
we obtain the stated result.
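As a numerical sanity check of Lemma 1, the short sketch below evaluates density (1) for an illustrative standard exponential parent (this choice of parent and the helper name `order_stat_pdf` are assumptions made for the example), verifies that it integrates to 1, and compares its mean with $E[Y_k] = \sum_{i=n-k+1}^{n} 1/i$, a classical property of exponential order statistics.

```python
import math
from scipy import integrate

def order_stat_pdf(x, k, n, F, f):
    """Density of the k-th order statistic of an i.i.d. n-sample, eq. (1)."""
    coef = math.factorial(n) / (math.factorial(k - 1) * math.factorial(n - k))
    return coef * F(x) ** (k - 1) * (1.0 - F(x)) ** (n - k) * f(x)

# Illustrative parent (assumption): standard exponential law on x > 0.
def F(x):
    return -math.expm1(-x)      # 1 - exp(-x), computed accurately near 0

def f(x):
    return math.exp(-x)

n, k = 7, 3
total, _ = integrate.quad(lambda x: order_stat_pdf(x, k, n, F, f), 0.0, math.inf)
mean, _ = integrate.quad(lambda x: x * order_stat_pdf(x, k, n, F, f), 0.0, math.inf)
exact_mean = sum(1.0 / i for i in range(n - k + 1, n + 1))   # 1/5 + 1/6 + 1/7
print(total, mean, exact_mean)
```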

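Lemma 2: The joint distribution of the $k$-th and $l$-th order statistics ($k < l$) is given by

$$dP_{kl}(x, y) = \frac{n!}{(k-1)!\,(l-k-1)!\,(n-l)!}\, F(x)^{k-1}\, \bigl(F(y) - F(x)\bigr)^{l-k-1}\, \bigl(1 - F(y)\bigr)^{n-l}\, dF(x)\, dF(y) \qquad (2)$$

for $x < y$, and $dP_{kl}(x, y) = 0$ for $x > y$.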

Proof: $Y_k$ lies between $x$ and $x + dx$ and $Y_l$ between $y$ and $y + dy$ if and only if $k - 1$ among
the $X$'s take their values below $x$ (probability $F(x)^{k-1}$), one takes its value between $x$
and $x + dx$ (probability $\simeq dF(x)$), $l - k - 1$ among the remaining $n - k$ take their values
between $x$ and $y$ (probability $(F(y) - F(x))^{l-k-1}$), one takes its value between $y$ and $y + dy$
(probability $\simeq dF(y)$) and the remaining $n - l$ are all above $y$ (probability $(1 - F(y))^{n-l}$). Counting
the number of mutually exclusive choices for the indices entering those various sets and
simplifying the result yields the given numerical coefficient.
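Equation (2) can be checked the same way. The sketch below uses a hypothetical helper `joint_order_pdf` with a uniform parent (an illustrative assumption), integrates the joint density over $x < y$ with SciPy's `dblquad`, and compares $E[Y_k Y_l]$ with the standard uniform result $k(l+1)/((n+1)(n+2))$.

```python
import math
from scipy import integrate

def joint_order_pdf(x, y, k, l, n, F, f):
    """Joint density of (Y_k, Y_l), k < l, from eq. (2); zero unless x < y."""
    if x >= y:
        return 0.0
    coef = math.factorial(n) / (
        math.factorial(k - 1) * math.factorial(l - k - 1) * math.factorial(n - l)
    )
    return (coef * F(x) ** (k - 1) * (F(y) - F(x)) ** (l - k - 1)
            * (1.0 - F(y)) ** (n - l) * f(x) * f(y))

# Uniform parent on [0, 1] (illustrative assumption): F(x) = x, f(x) = 1.
def F(x):
    return x

def f(x):
    return 1.0

n, k, l = 6, 2, 5
# dblquad(func, a, b, gfun, hfun) integrates func(y, x) for x in [a, b], y in [gfun(x), hfun(x)]
norm, _ = integrate.dblquad(lambda y, x: joint_order_pdf(x, y, k, l, n, F, f),
                            0.0, 1.0, lambda x: x, lambda x: 1.0)
cross, _ = integrate.dblquad(lambda y, x: x * y * joint_order_pdf(x, y, k, l, n, F, f),
                             0.0, 1.0, lambda x: x, lambda x: 1.0)
print(norm, cross, k * (l + 1) / ((n + 1) * (n + 2)))
```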

Clearly, these results can be extended to an arbitrary number of sample quantiles, but
we shall not need more in this paper. Note, however, the obvious but interesting fact that
after the change of variable $x \to F(x)$ those distributions are totally independent of the
parent probability law: the $F(Y_k)$'s follow $\beta$-type probability distributions (by (1), $F(Y_k)$
follows a $\beta(k, n-k+1)$ law) and multidimensional extensions thereof. This is very useful when
constructing confidence intervals for the distribution quantiles.

III. THE CASE OF THE MEDIAN 

As already remarked, the hypotheses entail that there exists a unique value $m_{1/2}$ of the
argument of $F$ for which $F(m_{1/2}) = 1/2$; $m_{1/2}$ is then known as the parent distribution
median, which we shall presently relate to the sample median, defined as the central
value for an odd-sized sample (the $(r+1)$-th order statistic if $n = 2r + 1$) or, for an
even-sized sample ($n = 2r$), whichever of the two central values (the $r$-th or the $(r+1)$-th
order statistic) comes up in an even-odds random draw.

Lemma 3: For an odd-sized sample of size $2r + 1$, the probability that the median $M$
lies between $x$ and $x + dx$ is $(r + 1)\, C_{2r+1}^{r}\, F(x)^r\, (1 - F(x))^r\, dF(x)$ up to higher order terms.

Proof: This is Lemma 1 applied to $n = 2r + 1$ and $k = r + 1$.

The p.d.f. of the sample median is therefore:

$$f_M(x) = \frac{(2r+1)!}{(r!)^2}\, F(x)^r\, \bigl(1 - F(x)\bigr)^r\, f(x)$$

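By a similar use of Lemma 1, it is immediate to show that for an even-sized sample of size
$2r + 2$, the density of the $(r+1)$-th variable is:

$$f_{r+1}(x) = \frac{(2r+2)!}{r!\,(r+1)!}\, F(x)^r\, \bigl(1 - F(x)\bigr)^{r+1}\, f(x)$$

and that of the $(r+2)$-th variable is:

$$f_{r+2}(x) = \frac{(2r+2)!}{(r+1)!\,r!}\, F(x)^{r+1}\, \bigl(1 - F(x)\bigr)^{r}\, f(x)$$

Since, following our definition of the median for an even sample, each of the two is selected
with probability 1/2, the density of the median is their average:

$$\frac{f_{r+1}(x) + f_{r+2}(x)}{2} = \frac{(2r+2)!}{2\,r!\,(r+1)!}\, F(x)^r\, \bigl(1 - F(x)\bigr)^r\, f(x) = \frac{(2r+1)!}{(r!)^2}\, F(x)^r\, \bigl(1 - F(x)\bigr)^r\, f(x),$$

which is the same distribution that we have found for the $(2r+1)$-sized sample.

This being established, we won't have to worry about the parity of the sample size in most
cases.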
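On the $v = F(x)$ scale the three densities above are $\beta$ densities, the common factor $f(x)$ dropping out: $\beta(r+1, r+2)$ and $\beta(r+2, r+1)$ for the two central order statistics of the $(2r+2)$-sample, and $\beta(r+1, r+1)$ for the median of the $(2r+1)$-sample. A minimal check of the averaging identity, assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy.stats import beta

v = np.linspace(0.001, 0.999, 999)      # interior grid on the v = F(x) scale
max_diff = {}
for r in (0, 1, 5, 20):
    # (r+1)-th and (r+2)-th order statistics of a (2r+2)-sample, each picked with probability 1/2
    even_median = 0.5 * (beta.pdf(v, r + 1, r + 2) + beta.pdf(v, r + 2, r + 1))
    # median of a (2r+1)-sample (Lemma 3) on the same scale
    odd_median = beta.pdf(v, r + 1, r + 1)
    max_diff[r] = float(np.max(np.abs(even_median - odd_median)))
print(max_diff)
```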

A. Properties shared with the mean 

In what follows, $M$ is assumed to be the median of an odd-sized sample of i.i.d. random
variables of p.d.f. $f(x)$, and $\mathcal{M}$ stands for the "median operator", that is, $\mathcal{M}[X]$ is the
median of the distribution of the random variable $X$ exactly as $E[X]$ is the expectation
value of this distribution (remember that we restrict ourselves to distributions having a
unique median). $f_M$ will denote the p.d.f. of $M$.

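Theorem 1: The median of the distribution $f_M(x)$ is the same as that of $f(x)$, that is,
$\mathcal{M}[M] = \mathcal{M}[X]$.

Proof: Through the change of variable $v = F(x)$,

$$\int_{-\infty}^{x} f_M(t)\, dt = \int_0^{F(x)} \frac{(2r+1)!}{(r!)^2}\, v^r (1 - v)^r\, dv.$$

However, the last integrand is form-invariant under $v \to 1 - v$, and the integral up to $F = 1$
must be equal to 1; therefore:

$$\int_0^{1/2} \frac{(2r+1)!}{(r!)^2}\, v^r (1 - v)^r\, dv = \int_{1/2}^{1} \frac{(2r+1)!}{(r!)^2}\, v^r (1 - v)^r\, dv = \frac{1}{2}.$$

Since $F(m_{1/2}) = 1/2$, it follows that $m_{1/2}$ is also the median of the distribution $f_M$.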

Remark 1: This is the perfect analogue of the equality between the E.V. of the sample
mean and that of the parent distribution. Clearly, the E.V. of $M$ plays no role here.

Remark 2: This theorem solves the problem of a distribution-free point estimation of
the median, contrary to what is stated in^, but it must be remembered that the estimator
is median-unbiased. Searching for an expectation-value-unbiased estimator is not
logically well grounded, and it is clear that the E.V. of $M$ will generally depend on the
particular $F$ at stake and be difficult to calculate, with the obvious exception of symmetrical
distributions (see below). The question of interval estimation will be taken up later on.
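Theorem 1 and Remarks 1-2 can be illustrated numerically. In the sketch below, the first part checks $P(M \le m_{1/2}) = 1/2$ on the $v$ scale for several $r$; the second uses an exponential parent (an illustrative assumption) with $n = 5$, for which the median of $M$ is $\ln 2 \approx 0.693$ while $E[M] = 1/3 + 1/4 + 1/5 = 47/60 \approx 0.783$.

```python
import math
from scipy import integrate
from scipy.stats import beta

# Theorem 1 on the v = F(x) scale: P(M <= m_1/2) = P(V <= 1/2) with V ~ Beta(r+1, r+1).
tail = [beta.cdf(0.5, r + 1, r + 1) for r in range(6)]
print(tail)                                   # each entry is 1/2 up to rounding

# Remarks 1-2 with an illustrative exponential parent: median ln 2, sample size n = 5.
r = 2
coef = math.factorial(2 * r + 1) / math.factorial(r) ** 2

def f_M(x):
    F = -math.expm1(-x)                       # parent c.d.f. 1 - exp(-x)
    return coef * F ** r * (1.0 - F) ** r * math.exp(-x)

p_below, _ = integrate.quad(f_M, 0.0, math.log(2.0))          # the median of M is ln 2 ...
EM, _ = integrate.quad(lambda x: x * f_M(x), 0.0, math.inf)   # ... but E[M] is not
print(p_below, EM, math.log(2.0))
```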

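Theorem 2: The median of a probability distribution (with a finite first absolute moment) is
the number with respect to which the mean absolute deviation is minimal:

$$h(\xi) = \int_{-\infty}^{+\infty} |x - \xi|\, f(x)\, dx \quad \text{is minimal for } \xi = m_{1/2}.$$

Proof: Write

$$h(\xi) = \int_{-\infty}^{+\infty} |x - \xi|\, f(x)\, dx = \int_{x < \xi} (\xi - x)\, f(x)\, dx + \int_{x > \xi} (x - \xi)\, f(x)\, dx.$$

Then

$$\frac{dh}{d\xi}(\xi) = \int_{x < \xi} f(x)\, dx - \int_{x > \xi} f(x)\, dx = 2F(\xi) - 1 \quad \text{and} \quad \frac{d^2h}{d\xi^2}(\xi) = 2f(\xi) \ge 0;$$

since $F$ is strictly increasing, the point $\xi = m_{1/2}$ is the only zero of $dh/d\xi$, which is negative
below it and positive above it, and it therefore corresponds to the minimum of $h$.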

This parallels the property of the E.V. of minimizing the mean squared deviation
($g(a) = E[(X - a)^2]$ is minimal for $a = E[X]$).
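A quick numerical illustration of Theorem 2 and of its E.V. counterpart, assuming an exponential parent (median $\ln 2$, E.V. 1) chosen only for the example; the two objective functions are integrated with `scipy.integrate.quad` and minimized with a bounded scalar search.

```python
import numpy as np
from scipy import integrate, optimize

# Illustrative parent (assumption): standard exponential, median ln 2, E.V. 1.
def f(x):
    return np.exp(-x)

def mean_abs_dev(xi):
    """h(xi) = E|X - xi|, split at xi so that each quad call sees a smooth integrand."""
    left, _ = integrate.quad(lambda x: (xi - x) * f(x), 0.0, xi)
    right, _ = integrate.quad(lambda x: (x - xi) * f(x), xi, np.inf)
    return left + right

def mean_sq_dev(a):
    """g(a) = E[(X - a)^2]."""
    val, _ = integrate.quad(lambda x: (x - a) ** 2 * f(x), 0.0, np.inf)
    return val

opts = {"xatol": 1e-9}
xi_star = optimize.minimize_scalar(mean_abs_dev, bounds=(0.0, 5.0), method="bounded", options=opts).x
a_star = optimize.minimize_scalar(mean_sq_dev, bounds=(0.0, 5.0), method="bounded", options=opts).x
print(xi_star, np.log(2.0))   # minimizer of the mean absolute deviation vs the median
print(a_star)                 # minimizer of the mean squared deviation: the E.V.
```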

Theorem 3: The median of an odd-sized set of real numbers $x_j$, $j = 1, \dots, 2r + 1$, is the
value which minimizes $f(\xi) = \sum_j |x_j - \xi|$ with respect to $\xi$.

Proof: As before, write $f(\xi) = \sum_j |x_j - \xi| = \sum_{x_j < \xi} (\xi - x_j) + \sum_{x_j > \xi} (x_j - \xi)$.
This (continuous) function of $\xi$ can be differentiated everywhere except at the points $x_j$:

$$f'(\xi) = \mathrm{card}\{j \mid x_j < \xi\} - \mathrm{card}\{j \mid x_j > \xi\}.$$

$f'$ is discontinuous and piecewise constant, but clearly monotonic: negative for small $\xi$,
positive for large $\xi$, and, with the $x_j$ sorted in ascending order, it changes sign (from $-1$ to $+1$)
at $\xi = x_{r+1}$, which is therefore the minimum of $f$.

Note that for an even-sized set of numbers, $f'$ vanishes on the whole central interval: any value
in it is a solution and a legitimate median.

Here also we have a result quite analogous to the one which holds for the mean w.r.t.
the sum of the squared deviations ($g(a) = \sum_j (x_j - a)^2$ is minimal for $a = \frac{1}{n}\sum_j x_j$).

Corollary: The sample median is the maximum likelihood estimator of the parameter
$a$ (which equals the E.V. and the median) of "Laplace's first law of errors", the density of
which reads:

$$f(x) = \frac{1}{2b}\, e^{-|x - a|/b}, \qquad b > 0.$$

Maximizing the log-likelihood w.r.t. $a$ is the same as minimizing $f(\xi)$ in Theorem 3.

Once again, this is the analogue of the result for the sample mean of an i.i.d. Gaussian sample.
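Theorem 3 and the even-size remark can be checked by brute force. Since $f(\xi)$ is convex and piecewise linear with kinks at the data points, it suffices to evaluate it at the $x_j$. The Cauchy data below are an arbitrary illustrative choice; by the corollary, the minimizer found here is also the maximum likelihood estimate of the Laplace location parameter $a$.

```python
import numpy as np

def sum_abs_dev(xs, xi):
    """f(xi) = sum_j |x_j - xi| for the numbers xs."""
    return float(np.abs(xs - xi).sum())

rng = np.random.default_rng(0)
r = 10
xs = rng.standard_cauchy(2 * r + 1)     # odd-sized set; heavy tails on purpose

# f is convex and piecewise linear with kinks at the x_j, so a minimizer is one of them.
values = np.array([sum_abs_dev(xs, x) for x in xs])
best = xs[np.argmin(values)]
print(best, np.median(xs))              # the same number

# Even-sized set: f is constant on the central interval [x_(r), x_(r+1)].
ys = np.sort(rng.standard_cauchy(2 * r))
grid = np.linspace(ys[r - 1], ys[r], 5)
flat = [sum_abs_dev(ys, g) for g in grid]
print(flat)                             # equal up to rounding
```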
Remark 3: The analogy between the sample median and the sample mean is apparently
dimmed by the fact that the dispersion of the sample mean is easily calculated in terms of the
parent distribution variance, whereas the same calculation with $f_M$ is obviously much more
difficult. The point is that the dispersion, too, is expressed in terms of expectation values,
and one is again confronted with the problem of asking the right question. One should
therefore find a measure of dispersion which can be expressed in a way closer to the
median kind of philosophy. This could be the width of an interval containing the sample
median with a given probability. The problem will be briefly addressed in Part V.
Another point is that the E.V. is a linear operator on the vector space of random variables
which admit an E.V. Likewise, the arithmetic mean of sequences of length $n$ is a linear
operator on the vector space $\mathbb{R}^n$. Such is generally not the case, neither for the distribution
median nor for the median of a numerical sequence.

However, the following two theorems, which sum up intuitively obvious properties, show
that for symmetrically distributed random variables the median operator is linear:

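Theorem 4: If $f$ is symmetrical about $a$, then $m_{1/2} = a$ and $f_M$ is also symmetrical
about $a$. In addition, if $f$ has an expectation value $m$, then $m_{1/2} = m = E[M] = \mathcal{M}[M]$.

Proof: Assume that $f(2a - x) = f(x)$. Then

$$\int_{-\infty}^{a} f(x)\, dx = \int_{-\infty}^{a} f(2a - x)\, dx = \int_{a}^{+\infty} f(y)\, dy \quad \text{with } x \to 2a - y.$$

Since $\int_{-\infty}^{+\infty} f(x)\, dx = 1$, one thus finds $\int_{-\infty}^{a} f(x)\, dx = 1/2$ and $a = m_{1/2}$.

Moreover, $F(2a - x) = \int_{-\infty}^{2a - x} f(t)\, dt = \int_{x}^{+\infty} f(2a - u)\, du = 1 - F(x)$, using again $t \to 2a - u$
and the symmetry of $f$.

Thus $f_M(2a - x) = \frac{(2r+1)!}{(r!)^2}\, F(2a - x)^r\, (1 - F(2a - x))^r\, f(2a - x) = f_M(x)$.

Now $m = \int x f(x)\, dx = \int x f(2a - x)\, dx = \int (2a - y) f(y)\, dy = 2a - m$.

Therefore $a = m = m_{1/2}$, and since $F(1 - F) \le 1/4$, it is obvious that if the parent has an
expectation value, so does $M$. Hence, by the symmetry of $f_M$, $a = E[M] = \mathcal{M}[M]$.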
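A numerical check of Theorem 4, assuming (for illustration only) a Student $t$ parent with 3 degrees of freedom shifted to be symmetric about $a = 2$, so that an E.V. exists; the density $f_M$ is built from the formula for the p.d.f. of the sample median.

```python
import math
from scipy import integrate
from scipy.stats import t as student_t

a, r = 2.0, 3                         # symmetry centre and sample size 2r + 1 = 7 (illustrative)
parent = student_t(df=3, loc=a)       # symmetric about a, with an E.V. since df > 1
coef = math.factorial(2 * r + 1) / math.factorial(r) ** 2

def f_M(x):
    F = parent.cdf(x)
    return coef * F ** r * (1.0 - F) ** r * parent.pdf(x)

# f_M(2a - x) = f_M(x) at a few test points
asym = max(abs(f_M(2 * a - x) - f_M(x)) for x in (-3.0, 0.5, 1.7, 4.2, 9.0))
EM, _ = integrate.quad(lambda x: x * f_M(x), -math.inf, math.inf)
print(parent.median(), parent.mean(), EM, asym)
```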

Remark 4: There is evidently no reason for $E[M]$ to be equal to $E[X]$ in the general
case, not even asymptotically (see below).

Theorem 5: Let $X$ and $Y$ be two independent random variables, the densities of which
are symmetrical about $a$ and $b$ respectively. Then $Z = \alpha X + \beta Y$ is symmetrical about
$\alpha a + \beta b$.

Proof: Since $\alpha X$ is obviously symmetrical about $\alpha a$ (and $\beta Y$ about $\beta b$), it suffices to treat
$Z = X + Y$. Then

$$f_Z(z) = \int_{-\infty}^{+\infty} f_X(t)\, f_Y(z - t)\, dt = \int_{-\infty}^{+\infty} f_X(ru + s)\, f_Y(z - ru - s)\, |r|\, du$$

for any real numbers $r \neq 0$ and $s$. Now

$$f_Z(2a + 2b - z) = \int_{-\infty}^{+\infty} f_X(ru + s)\, f_Y(2a + 2b - z - ru - s)\, |r|\, du,$$

but since $f_X(2a - x) = f_X(x)$ and $f_Y(2b - x) = f_Y(x)$, this can be rewritten:

$$f_Z(2a + 2b - z) = \int_{-\infty}^{+\infty} f_X(2a - ru - s)\, f_Y(z + ru + s - 2a)\, |r|\, du,$$

which, with $s = 2a$ and $r = -1$, is identical to the first expression for $f_Z(z)$.

With the same hypotheses, $\mathcal{M}[X] = a$ and $\mathcal{M}[Y] = b$ by Theorem 4, and the last result shows
that $\mathcal{M}[\alpha X + \beta Y] = \alpha a + \beta b$.

Therefore $\mathcal{M}$ acts as a linear operator on independent, symmetrically
distributed random variables.
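A Monte Carlo sketch of Theorem 5 and of the warning that follows it. The symmetric laws (shifted Student $t$ and Laplace), the constants and the sample size are illustrative assumptions; the last lines use the exact fact that the sum of two independent Exp(1) variables is Gamma(2)-distributed to exhibit a non-symmetric case where the median is not additive.

```python
import numpy as np
from scipy.stats import expon, gamma

rng = np.random.default_rng(1)
N = 400_000
a, b = 1.0, 3.0                    # symmetry centres (illustrative)
alpha, beta_ = 2.0, -0.5           # coefficients (trailing underscore only to avoid clashing with scipy.stats.beta)
X = a + rng.standard_t(df=3, size=N)      # symmetric about a
Y = b + rng.laplace(size=N)               # symmetric about b, independent of X
med_Z = float(np.median(alpha * X + beta_ * Y))
print(med_Z, alpha * a + beta_ * b)       # both close to 0.5

# Non-symmetric counterexample: for X, Y i.i.d. Exp(1), X + Y ~ Gamma(2),
# whose median (about 1.678) differs from M[X] + M[Y] = 2 ln 2 (about 1.386).
print(gamma(2).median(), 2 * expon.median())
```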

Let us stress again that these linearity properties do not generally apply to non-symmetrical
random variables. However, the following is obvious:

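Theorem 6: Let $X$ and $Y$ be two continuous, independent, identically distributed
random variables. Then the median of $X - Y$ is 0.

Proof: $f_{X-Y}(z) = \int f(z + t) f(t)\, dt$, hence $f_{X-Y}(-z) = \int f(-z + t) f(t)\, dt$, which can
be rewritten $\int f(t) f(t + z)\, dt$ by changing the integration variable $t \to z + t$. The density of
$X - Y$ is thus symmetrical about 0 and, by Theorem 4, its median is 0.
One can obviously also use a symmetry argument: $P(X < Y) = P(Y < X)$, and both equal 1/2 since $P(X = Y) = 0$.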

A barely less simplistic symmetry argument will be used to show the following:

Theorem 7: Let $X_i$, $i = 1, \dots, 2r + 1$, be a set of independent random variables having
the same median $m_{1/2}$ but otherwise arbitrary distributions, and let $M$ be defined as for an
i.i.d. sample. Then $\mathcal{M}[M] = m_{1/2}$.

This, again, is a property shared with the mean and the E.V. in which case it is a 
straightforward consequence of linearity. For the median, however, another argument is 
required. 
Proof: First note that, without loss of generality, $m_{1/2}$ can be taken equal to 0.

The hypothesis then reads $P[X_i < 0] = 1/2$ for all $i$, and if $M$ is the median of the $X_j$'s, one
wants to show that $P[M < 0] = 1/2$.

Now $M < 0$ if and only if at least $r + 1$ among the $2r + 1$ events $\{X_i < 0\}$ occur. However,
these events are independent and each has probability 1/2, so the number of them which occur
is a binomial random variable $B(2r + 1, 1/2)$, and the probability of having exactly $k$ events
is $C_{2r+1}^{k}/2^{2r+1}$.

Therefore $P[M < 0] = \frac{1}{2^{2r+1}} \sum_{k \ge r+1} C_{2r+1}^{k}$. By the symmetry $C_{2r+1}^{k} = C_{2r+1}^{2r+1-k}$, the sum
on the right-hand side equals half the total $2^{2r+1}$, and the theorem is proved.
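Both ingredients of the proof of Theorem 7 can be checked quickly: the binomial identity exactly, and the theorem itself by simulation with five different laws sharing the median 0 (the particular laws are arbitrary choices made for this sketch).

```python
import math
import numpy as np

# Exact step of the proof: sum_{k >= r+1} C(2r+1, k) = 2^(2r), half of 2^(2r+1).
for r in range(8):
    assert sum(math.comb(2 * r + 1, k) for k in range(r + 1, 2 * r + 2)) == 2 ** (2 * r)

# Monte Carlo with r = 2: five independent variables, different laws, common median 0.
rng = np.random.default_rng(2)
N = 200_000
draws = np.stack([
    rng.normal(size=N),                                   # N(0, 1)
    rng.standard_cauchy(size=N),                          # Cauchy
    rng.exponential(size=N) - math.log(2.0),              # Exp(1) minus its median
    3.0 * math.log(2.0) - 3.0 * rng.exponential(size=N),  # reflected, rescaled exponential
    rng.lognormal(size=N) - 1.0,                          # lognormal(0, 1) minus its median
])
M = np.median(draws, axis=0)          # sample median of each 5-tuple
frac = float(np.mean(M < 0.0))
print(frac)                           # close to 1/2
```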

B. Properties not shared with the mean 

It is well known that quite generally $g(E[X]) \neq E[g(X)]$, except when $g$ is an affine
function of its argument.

For example, if $g$ is convex, Jensen's inequality holds: $g(E[X]) \le E[g(X)]$.
In statistical inference this has the consequence that, if an unbiased estimator can be found
for some parameter of a probability distribution, no (non-affine) function of this parameter
can in general be estimated without bias by applying that function to the same estimator.

From that point of view, the distribution median is much easier to manipulate. Since it is
defined by $P(X < m_{1/2}) = 1/2$, it is immediate that for any continuous, strictly monotonic
$g$, $g(m_{1/2})$ is the median of the distribution of $g(X)$. Indeed, $X < m_{1/2}$ is equivalent to
either $g(X) < g(m_{1/2})$ (if $g$ is increasing) or $g(X) > g(m_{1/2})$ (if $g$ is decreasing). Hence
$P(g(X) < g(m_{1/2})) = 1/2$ in both cases.
We shall call this the "invariance" property of the median in the sequel. Applying it to
an affine transformation, we get:

$$\mathcal{M}[\alpha X + \beta] = \alpha\, \mathcal{M}[X] + \beta$$

for any two constants $\alpha$ and $\beta$, without any symmetry hypothesis here.
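The invariance property also holds exactly at the sample level for an odd-sized sample, since a strictly monotonic map preserves (or reverses) the ordering. The sketch below, with illustrative Gaussian data, contrasts this with the mean, for which Jensen's inequality applies to the convex map $x \mapsto e^x$.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=0.3, scale=1.2, size=10_001)   # odd size: the median is one data point

print(np.median(np.exp(x)), np.exp(np.median(x)))   # identical: order is preserved
print(np.mean(np.exp(x)), np.exp(np.mean(x)))       # differ (Jensen: the first is larger)

def h(u):
    """A strictly decreasing affine map (illustrative)."""
    return -2.0 * u + 5.0

print(np.median(h(x)), h(np.median(x)))             # identical: order is reversed
```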

Of course, this still leaves us far from the linearity of the E.V. operator, but the
invariance property strengthens the case for an estimation theory wherein 'unbiasedness'
is no longer defined w.r.t. the E.V. of the estimator, but w.r.t. its median. The rationale
for such a definition is neither worse nor better than that for the usual definition. Clearly,
there is nothing sacred about 'unbiasedness' being defined in terms of expectation value.

On the other hand, finding the distribution median of a function of several random
variables is not an easy task, except in the trivial and uninteresting Gaussian case, since it
always boils down to inverting some sort of c.d.f. defined by a convolution.

IV. USING THE MEDIAN FOR ESTIMATION 
A. Estimation of the distribution median 

Theorem 1 solves the problem of the point estimation of the distribution median (see
Remark 2): the sample median $M$, as defined here for both even- and odd-sized samples, is a
(median-)unbiased estimator of the distribution median $m_{1/2}$.

By the same token, it shows how median-unbiased estimators can be constructed in
principle. If a transformation of the parent random variable can be found such that its
median is equal to the parameter to be estimated, the median of the transformed sample
becomes a median-unbiased estimator of the said parameter.

A simple example could be the estimation of the unknown parameter $a > 0$ of the power
p.d.f.

$$f(x) = a\, x^{a-1} \quad \text{if } x \in\, ]0, 1], \qquad f(x) = 0 \text{ otherwise}.$$

The distribution median is $2^{-1/a}$, and therefore the median of the transformed sample
$X_i \to -\ln 2 / \ln X_i$ is a median-unbiased estimator of $a$ (the transformation is strictly increasing
on $]0, 1[$ and maps $2^{-1/a}$ to $a$). Of course there are more classical solutions, and we do not claim
that this one is the "best".
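A simulation sketch of the power-law example, with illustrative values $a = 2.5$ and $n = 11$ (assumptions, not from the text). Samples are drawn by inverting $F(x) = x^a$; median-unbiasedness means that the estimate falls below $a$ about half of the time.

```python
import numpy as np

rng = np.random.default_rng(4)
a_true, n, reps = 2.5, 11, 100_000                 # hypothetical values
X = rng.uniform(size=(reps, n)) ** (1.0 / a_true)  # inverse c.d.f. of F(x) = x^a on ]0, 1]
T = -np.log(2.0) / np.log(X)                       # transformed sample; its median is a
a_hat = np.median(T, axis=1)                       # one median-unbiased estimate per sample
frac_below = float(np.mean(a_hat < a_true))
print(frac_below)                                  # close to 1/2
```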

A slightly different formulation is the following: if the parameter of interest can be
expressed as a monotonic function of the distribution median, then (by the invariance
property) the same function calculated with the sample median yields a median-unbiased
estimator of that parameter.
For example, estimating the parameter of the exponential density $f_a(x) = a\, e^{-ax}\, \theta(x)$
(with $\theta$ the Heaviside step function) can be done by observing that $m_{1/2} = \ln 2 / a$; therefore
$\hat{a} = \ln 2 / M$, with $M$ the sample median, is a median-unbiased estimator of $a$. Again, there
are more classical solutions to which this one should be compared.
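The same kind of check for the exponential example, with illustrative values of $a$ and $n$. The classical maximum likelihood estimator $1/\bar{X}$ is printed only for comparison; it is not median-unbiased in general.

```python
import numpy as np

rng = np.random.default_rng(5)
a_true, n, reps = 0.7, 15, 100_000                 # hypothetical values
X = rng.exponential(scale=1.0 / a_true, size=(reps, n))
a_med = np.log(2.0) / np.median(X, axis=1)         # ln 2 / M, median-unbiased
a_mle = 1.0 / np.mean(X, axis=1)                   # classical MLE, for comparison only
frac_med = float(np.mean(a_med < a_true))
frac_mle = float(np.mean(a_mle < a_true))
print(frac_med, frac_mle)                          # the first is close to 1/2; the second need not be
```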



B. Interval estimation 

Building a confidence interval for $m_{1/2}$ is also simple, but can be done in full rigour and
generality only for certain values of the confidence level, which depend on the sample size.
Indeed, Lemma 2 shows that the random interval formed by any pair of sample quantiles
contains a given distribution quantile with a calculable probability, independent of the parent
distribution:

$$P(Y_k < m_q < Y_l) = P(F(Y_k) < q < F(Y_l)) = \int_0^q du \int_q^1 dv\, \frac{u^{k-1}\,(v - u)^{l-k-1}\,(1 - v)^{n-l}}{B(k, l-k)\, B(l, n-l+1)} \qquad (3)$$

where we have changed variables $x \to u = F(x)$ and $y \to v = F(y)$ in (2).

This latter expression is evidently independent of $F$, and in the case of the median
($q = 1/2$) it seems reasonable to take $l = n - k + 1$, so that both ends of the interval are
determined by statistics playing symmetrical roles.

Expression (3) can be simplified. The demonstration given in^ is both unduly complicated
and wrong. Here is the simple way:

$P(F(Y_k) < q) = P(F(Y_k) < q,\, F(Y_l) < q) + P(F(Y_k) < q,\, F(Y_l) > q)$, but since $Y_k < Y_l$ by
definition, the first term reduces to $P(F(Y_l) < q)$, while the second is the probability in (3); therefore

$$P(Y_k < m_q < Y_l) = P(F(Y_k) < q) - P(F(Y_l) < q) \qquad (4)$$

which is readily calculated in terms of incomplete $\beta$ functions according to Lemma 1.
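Equation (4) is easy to evaluate with the incomplete $\beta$ function, since by Lemma 1 $F(Y_k)$ follows a $\beta(k, n-k+1)$ law. The hypothetical helper `coverage` below does this, and a Monte Carlo run with a lognormal parent (an illustrative skewed choice, whose median is 1) confirms that the result does not depend on the parent law.

```python
import numpy as np
from scipy.stats import beta

def coverage(n, k, l, q):
    """P(Y_k < m_q < Y_l) from eq. (4), using F(Y_j) ~ Beta(j, n - j + 1)."""
    return beta.cdf(q, k, n - k + 1) - beta.cdf(q, l, n - l + 1)

n, k, l, q = 20, 6, 15, 0.5          # symmetric choice l = n - k + 1
p_formula = coverage(n, k, l, q)

rng = np.random.default_rng(6)
Y = np.sort(rng.lognormal(size=(200_000, n)), axis=1)   # parent median exp(0) = 1
p_mc = float(np.mean((Y[:, k - 1] < 1.0) & (Y[:, l - 1] > 1.0)))
print(p_formula, p_mc)
```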



Remark 6: It is sometimes useful to bear in mind the connection between probabilities
such as $P(F(Y_k) < q)$ and the binomial distribution: indeed, $Y_k < m_q$ if and only if at least $k$
among the $X_i$ are below $m_q$ or, equivalently, at least $k$ among the $F(X_i)$ are below $q$. But the
$F(X_i)$ are independent and uniformly distributed between 0 and 1, and therefore the number
of them which take their value below $q$ follows a binomial distribution of parameters $n$ and
$q$. Hence $P(F(Y_k) < q)$ is the probability that a variate $B(n, q)$ following the said binomial
distribution takes a value $\ge k$. On the other hand, the c.d.f. of a binomial can readily
be written as an incomplete, normalized $\beta$ integral by differentiating the sum of the individual
probabilities w.r.t. $q$, simplifying the result and re-integrating w.r.t. $q$ with the appropriate
end-point condition. This yields

$$P(F(Y_k) < q) = P(B(n, q) \ge k) = \int_0^q \frac{u^{k-1}\,(1 - u)^{n-k}}{B(k, n-k+1)}\, du,$$

in accordance with a straightforward application of Lemma 1.

Thus, in terms of binomial probabilities, (4) can be rewritten

$$P(Y_k < m_q < Y_l) = P(l > B(n, q) \ge k) \qquad (5)$$

and for the median ($q = 1/2$),

$$P(Y_k < m_{1/2} < Y_l) = \frac{1}{2^n} \sum_{j=k}^{l-1} C_n^j = \frac{1}{2^n} \sum_{j=k}^{n-k} C_n^j \qquad (6)$$

where the symmetric choice $l = n - k + 1$ has been made in the last expression.
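Equation (6) lends itself to exact rational arithmetic. The sketch below uses hypothetical helper names; it returns the 1-based indices $(k, n-k+1)$ of the narrowest symmetric pair of order statistics whose coverage still reaches the requested level. As noted above, only a discrete set of levels is attainable exactly for a given $n$.

```python
from fractions import Fraction
from math import comb
from typing import Optional, Tuple

def median_ci_coverage(n: int, k: int) -> Fraction:
    """Exact P(Y_k < m_1/2 < Y_{n-k+1}) from eq. (6): 2^-n * sum_{j=k}^{n-k} C(n, j)."""
    if not 1 <= k <= n - k:
        raise ValueError("need 1 <= k <= n - k")
    return Fraction(sum(comb(n, j) for j in range(k, n - k + 1)), 2 ** n)

def median_ci_indices(n: int, level: float) -> Optional[Tuple[int, int]]:
    """Largest k whose symmetric interval (Y_k, Y_{n-k+1}) still has coverage >= level.
    Returns 1-based indices, or None if even (Y_1, Y_n) falls short."""
    best = None
    for k in range(1, n // 2 + 1):          # coverage decreases as k grows
        if median_ci_coverage(n, k) >= level:
            best = (k, n - k + 1)
        else:
            break
    return best

print(median_ci_coverage(10, 2), float(median_ci_coverage(10, 2)))
print(median_ci_indices(10, 0.95))
```

For instance, with `ys = sorted(sample)` and `k, l = median_ci_indices(len(ys), 0.95)`, the interval is `(ys[k - 1], ys[l - 1])`.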

Theorem 8: Let $X$ and $Y$ be two positive, continuous, independent and identically
distributed random variables. Then the median of $Y/X$ is equal to 1.

Proof: Since $Y/X < 1$ if and only if $Y - X < 0$, this theorem is equivalent to Theorem 6
restricted to strictly positive random variables. Another proof is as follows.

Let $F$ and $f$ be the common c.d.f. and p.d.f. of $X$ and $Y$. Because the two are independent,
it follows that:

$$P\left(\frac{Y}{X} < 1\right) = P(Y < X) = \int_0^{\infty} P(Y < z \mid X = z)\, f(z)\, dz = \int_0^{\infty} F(z)\, f(z)\, dz = \frac{1}{2}$$
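A Monte Carlo sketch of Theorem 8 and of its corollary below; the Gamma(2) parent, the constant $\alpha = 3$ and the sample size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 400_000
X = rng.gamma(shape=2.0, scale=1.0, size=N)   # positive, skewed parent (illustrative)
Y = rng.gamma(shape=2.0, scale=1.0, size=N)   # independent copy with the same law
med_ratio = float(np.median(Y / X))           # Theorem 8: close to 1

alpha = 3.0
Y_scaled = alpha * rng.gamma(shape=2.0, scale=1.0, size=N)   # distributed as alpha * X
med_scaled = float(np.median(Y_scaled / X))   # corollary: close to alpha
print(med_ratio, med_scaled)
```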

Corollary: If the last hypothesis is relaxed to "$Y$ is distributed as $\alpha X$, with $\alpha$ a positive
constant", then the median of $Y/X$ is equal to $\alpha$.

C. An example of application 

A recurrent problem is that of evaluating without 'bias' the ratio of the responses of
two detectors to the same signal. This arises, for example, in photometry, where one would
like, to state things simply, to assess the ratio of the responses of two telescopes to the light
received from the same stars. The estimate should, of course, be based on two sets of
observations performed with both instruments on the same objects.

The standard way of treating this kind of problem is to assume that both measurements
(of the same object) have 'true values' $X_{\text{true}}$ augmented by random errors of zero E.V. The
ratio $\alpha$ of the 'true values', supposed to be constant, is by definition the photometric ratio at
stake. The question does not possess any general unbiased answer in the usual sense (see
appendix). The standard least squares estimator is not even consistent: its bias, calculated
to lowest order, involves a term that does not go to zero in the large sample limit. The
best that we have found is to use a constrained least squares fit to refit the two sets of values
whilst imposing a proportionality relation on the refitted values. The estimator of the
ratio turns out to be the root of a highly nonlinear equation, which can be simplified if one
assumes that there is also a constant ratio between the standard deviations of the (random)
errors.
errors. In this case, the non-linear equation reduces to a quadratic polynomial, allowing for 
an easy evaluation of bias and variance to lowest order in the error moments. In particular, 
the bias is found to go to zero as -.- 

One might, however, make different assumptions. For example, it could be supposed
that the ratios of the paired signals from the same objects follow the same probability law,
and one could try to evaluate the median of this law, which would then be defined as the required
ratio. According to Theorem 1, the sample median is a (median-)unbiased estimator of this
parameter.

This would be the situation if the relative errors in each series were identically distributed,
in which case the measurements could be written $X_{i,\text{true}}(1 + \delta_i)$ and $\alpha X_{i,\text{true}}(1 + \eta_i)$, with
identically distributed $\delta_i$ and $\eta_i$. One would then be precisely in the situation suggested here,
with the median of the ratio probability distribution equal to $\alpha\, \mathcal{M}\left[(1 + \eta)/(1 + \delta)\right]$.

If one further assumes that the distribution of the $\eta$'s is identical to that of the $\delta$'s, the
correction factor to $\alpha$ is equal to 1 according to Theorem 8 and its corollary. In any case,
the value which should really be used in predicting, say, the luminosity of the image of
a new object through the second telescope, given its measured luminosity through the
first, is the corrected $\alpha$ given here and estimated by the sample median, for the relevant
quantity in such a comparison is the ratio of the measured quantities rather than the ratio
of inaccessible and poorly defined 'true values'.

Note that using the mean of the ratios (instead of their median) would yield
one of the worst estimators that can be thought of. It is easy to see that it is not even
consistent, as is always the case when one tries to estimate a (nonlinear) function value
by the average of the function values instead of first averaging the (measured) would-be
arguments and using their averages as arguments of the function.
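A simulation sketch of the photometric situation discussed above. As an illustrative assumption, the relative errors are taken multiplicative and lognormal, i.e. $1 + \delta_i = e^{u_i}$ and $1 + \eta_i = e^{w_i}$ with independent centred Gaussian $u_i, w_i$ of common standard deviation $\sigma$, and the 'true' fluxes are drawn from an arbitrary lognormal law. The ratios are then $\alpha e^{w_i - u_i}$, whose median is exactly $\alpha$ while their mean is $\alpha e^{\sigma^2}$, so the mean of the ratios converges to the wrong value whatever the sample size.

```python
import numpy as np

rng = np.random.default_rng(8)
N, alpha, sigma = 200_000, 0.8, 0.2                  # hypothetical values
x_true = rng.lognormal(mean=2.0, sigma=1.0, size=N)  # 'true' fluxes of the objects
x1 = x_true * np.exp(rng.normal(0.0, sigma, N))           # first telescope
x2 = alpha * x_true * np.exp(rng.normal(0.0, sigma, N))   # second telescope
ratios = x2 / x1                                     # alpha * exp(w_i - u_i)

median_of_ratios = float(np.median(ratios) / alpha)       # close to 1
mean_of_ratios = float(np.mean(ratios) / alpha)           # close to exp(sigma**2), not 1
ratio_of_means = float(np.mean(x2) / np.mean(x1) / alpha) # close to 1 in this model
print(median_of_ratios, mean_of_ratios, np.exp(sigma ** 2), ratio_of_means)
```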

One could also imagine that the measurements read $X_{i,\text{true}} + \delta_i$ and $\alpha X_{i,\text{true}} + \eta_i$, with
the absolute error $\eta_i$ being distributed as $\alpha \delta_i$. Then the corollary of Theorem 8 applies and
all the ratios have the same distribution median, which, through Theorem 7, can be estimated
by their sample median. But in this case one also has a constant ratio between the
variances, which allows one to simplify the least squares fit as explained at the beginning of this
subsection. Moreover, finding a confidence interval in the present case, where the various
ratios have the same median but otherwise different distributions, would not be an easy task.

There are, of course, other choices, using for example the ratio of the means (once again,
not the mean of the ratios!). But this might arguably be thought of as less effective, since it
does not take into account the correlation between the responses of the two instruments to
the signals from a given object. The same can be said of other estimators which agglomerate
the numerators and denominators separately before combining them, as for example the
geometric mean.

V. EVALUATING THE DISPERSION OF M 

As already observed, the variance, being defined in terms of moments, is not tailored for
an easy evaluation of the dispersion of a value which is characterized by probability
conditions. A more adequate definition must be based on the probability content of the interval
assumed to represent this dispersion. A possibility would be to take it symmetrical on the
$F(x)$ scale. If

$$\int_q^{1-q} \frac{(2r+1)!}{(r!)^2}\, v^r (1 - v)^r\, dv = 68\%, \quad \text{or equivalently} \quad \int_0^{q} \frac{(2r+1)!}{(r!)^2}\, v^r (1 - v)^r\, dv = 16\%,$$

one could take $F^{-1}(1 - q) - F^{-1}(q)$ as a measure of the dispersion of $M$.
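The 68% interval above is straightforward to compute, since the integrand is a normalized $\beta(r+1, r+1)$ density. The sketch below assumes an exponential parent (for which $F^{-1}(u) = -\ln(1-u)$ and $f(m_{1/2}) = 1/2$) and compares the resulting width with $2/\sqrt{n}$, i.e. twice the standard asymptotic standard deviation $1/(2 f(m_{1/2}) \sqrt{n})$ of the sample median; the helper name `median_dispersion` is hypothetical.

```python
import numpy as np
from scipy.stats import beta

def median_dispersion(r, F_inv, content=0.68):
    """q and the width F^{-1}(1 - q) - F^{-1}(q) of the interval, symmetric on the F scale,
    which contains the median of a (2r+1)-sample with probability `content`."""
    q = beta.ppf((1.0 - content) / 2.0, r + 1, r + 1)   # the 16% point for content = 68%
    return q, F_inv(1.0 - q) - F_inv(q)

def F_inv(u):
    """Quantile function of the illustrative exponential parent: -ln(1 - u)."""
    return -np.log1p(-u)

r = 50
q, width = median_dispersion(r, F_inv)
print(q, width, 2.0 / np.sqrt(2 * r + 1))
```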
Another solution could use the full width at half maximum (FWHM) of $f_M$ written in
terms of the $v$ variable, and convert it back to the $x$ scale. In terms of $F$ (or $v$), the condition

$$v^r (1 - v)^r = \frac{1}{2}\left(\frac{1}{4}\right)^r$$

yields immediately the bounds $\frac{1}{2} \pm \frac{1}{2}\sqrt{1 - (1/2)^{1/r}}$, and one sees that,
to lowest order in $1/r$, the interval width on the $F$ scale is $\sqrt{\ln 2 / r}$, or $\simeq \sqrt{\ln 2 / r}\,/\,f(m_{1/2})$ on the
$x$ scale. As expected, this goes to zero as $1/\sqrt{r}$. We shall see in the next section that the
asymptotic standard deviation is in fact $1/\bigl(2 f(m_{1/2}) \sqrt{2r}\bigr)$, which shows that the relation between
$\sigma_{\text{asympt}}$ and $\text{FWHM}_{\text{approx}}$ is the same as that of a Gaussian curve ($\text{FWHM} = 2\sqrt{2 \ln 2}\,\sigma$). The probability for $M$
to fall in this interval can easily be computed numerically. It decreases with $r$ but remains
above 0.761 up to $r$ values of several thousands.
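The probability content of the FWHM interval mentioned above can be computed from the $\beta(r+1, r+1)$ c.d.f.; the sketch below (hypothetical helper names) does so for a few values of $r$ and prints the Gaussian limit $P(|Z| < \sqrt{2\ln 2}) = \mathrm{erf}(\sqrt{\ln 2}) \approx 0.761$.

```python
import numpy as np
from scipy.special import erf
from scipy.stats import beta

def fwhm_bounds(r):
    """Bounds 1/2 +- (1/2) sqrt(1 - 2^(-1/r)) on the v = F(x) scale, where
    v^r (1 - v)^r falls to half its maximum (r >= 1)."""
    half = 0.5 * np.sqrt(-np.expm1(-np.log(2.0) / r))
    return 0.5 - half, 0.5 + half

def fwhm_content(r):
    """Probability that the median of a (2r+1)-sample falls inside the FWHM interval."""
    lo, hi = fwhm_bounds(r)
    return beta.cdf(hi, r + 1, r + 1) - beta.cdf(lo, r + 1, r + 1)

for r in (1, 2, 10, 100, 1000):
    print(r, fwhm_content(r))
print("Gaussian limit:", erf(np.sqrt(np.log(2.0))))
```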



VI. ASYMPTOTICS 

The derivation of the asymptotic distributions of the quantiles can be found everywhere, and
that of the median is but a particular case. We do here something which we haven't found
in the literature and which might be of some use to those who need a definition of the
'centroid' of a distribution in terms of median and mean: we calculate the joint asymptotic
distribution of the median and the mean for an independent sample of size $n = 2r + 1$.
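Before the derivation, a Monte Carlo look at the two scaled variables studied below can be reassuring. The sketch assumes an exponential parent ($m = 1$, $\mu = \ln 2$, $\sigma^2 = 1$, $f(\mu) = 1/2$) and compares the empirical covariance matrix of $(\sqrt{n}(\bar{X} - m), \sqrt{n}(M - \mu))$ with reference values from the standard Bahadur representation of the sample median, namely $\sigma^2$, $1/(4 f(\mu)^2)$ and $E|X - \mu| / (2 f(\mu))$; these reference values are supplied here as an outside check, not taken from this text.

```python
import numpy as np

rng = np.random.default_rng(9)
r = 200
n = 2 * r + 1
reps = 10_000
X = rng.exponential(size=(reps, n))                  # illustrative exponential parent
L = np.sqrt(n) * (X.mean(axis=1) - 1.0)              # sqrt(n) (sample mean - m)
H = np.sqrt(n) * (np.median(X, axis=1) - np.log(2.0))  # sqrt(n) (sample median - mu)
cov = np.cov(L, H)
print(cov)
# Reference values for this parent: Var L -> 1, Var H -> 1, Cov(L, H) -> ln 2
```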



A. Derivation 

In this subsection, the distribution median is written μ instead of $m_{1/2}$, for reasons which
should become clear on reading!

Let f(x) be the parent distribution p.d.f. We assume that it has an E.V. m, a variance
$\sigma^2$, and a unique median μ.

The joint sample p.d.f. is $\prod_{j=1}^{n} f(y_j)$; but for the reordered sample $\{Y_j\}$, this becomes
$n!\prod_{j=1}^{n} f(y_j)$ on the domain $D = \{y_1 < y_2 < y_3 < \dots < y_n\}$.

The sample mean $\bar X = \bar Y$ is known to have an asymptotically degenerate distribution
$\delta(x - m)$, and it is $\sqrt n(\bar X - m)$ which converges to a gaussian $\mathcal N(0, \sigma^2)$. What we have found
about the dispersion of $M = Y_{r+1}$ points to a similar behaviour, and we therefore define the
two random variables $L = \sqrt n(\bar Y - m)$ and $H = \sqrt n(Y_{r+1} - \mu)$ and calculate their joint p.d.f.
according to:

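$$g(l,h) = \int_D n!\prod_{j=1}^{n} dy_j\, f(y_j)\;\delta\Big(l - \frac{1}{\sqrt n}\sum_{j=1}^{n} y_j + \sqrt n\, m\Big)\,\delta\big(h - \sqrt n\, y_{r+1} + \sqrt n\,\mu\big)$$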

Since the order between the first r and the last r integration variables is irrelevant, this can 
be simplified to: 

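$$g(l,h) = \frac{n!}{(r!)^2}\int_{D'} \prod_{j=1}^{n} dy_j\, f(y_j)\;\delta\Big(l - \frac{1}{\sqrt n}\sum_{j=1}^{n} y_j + \sqrt n\, m\Big)\,\delta\big(h - \sqrt n\, y_{r+1} + \sqrt n\,\mu\big)$$

with $D' = \{y_1, y_2, \dots, y_r < y_{r+1} < y_{r+2}, y_{r+3}, \dots, y_n\}$.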

The yr+i integration amounts to a simple change of variable: 

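$$g(l,h) = \frac{n!}{(r!)^2\sqrt n}\, f\Big(\mu + \frac{h}{\sqrt n}\Big)\int \prod_{j=1}^{r} dy_j\, f(y_j)\,\theta\Big(\mu + \frac{h}{\sqrt n} - y_j\Big)\prod_{j=r+2}^{n} dy_j\, f(y_j)\,\theta\Big(y_j - \mu - \frac{h}{\sqrt n}\Big)\;\delta\Big(l - \frac{1}{\sqrt n}\Big[\sum_{j\neq r+1} y_j + \mu + \frac{h}{\sqrt n}\Big] + \sqrt n\, m\Big)$$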

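In order to perform the y-integrations, we now use the Fourier representation of Dirac's δ,
$\delta(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} dt\, e^{itx}$:

$$g(l,h) = \frac{n!}{(r!)^2\sqrt n}\, f\Big(\mu + \frac{h}{\sqrt n}\Big)\frac{1}{2\pi}\int_{-\infty}^{\infty} dt\, e^{itl}\, e^{\frac{it}{\sqrt n}\left(m - \mu - \frac{h}{\sqrt n}\right)}\left(\int_{-\infty}^{\mu + \frac{h}{\sqrt n}} dy\, f(y)\, e^{\frac{it(m-y)}{\sqrt n}}\right)^{r}\left(\int_{\mu + \frac{h}{\sqrt n}}^{\infty} dy\, f(y)\, e^{\frac{it(m-y)}{\sqrt n}}\right)^{r}$$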



The integrals between parentheses need only be calculated to order 1/n to find the limit when
$r = \frac{n-1}{2} \to \infty$.

We do this as follows:

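$$\int_{-\infty}^{\mu + \frac{h}{\sqrt n}} dy\, f(y)\, e^{\frac{it(m-y)}{\sqrt n}} = \int_{-\infty}^{\mu} dy\, f(y)\, e^{\frac{it(m-y)}{\sqrt n}} + \int_{\mu}^{\mu + \frac{h}{\sqrt n}} dy\, f(y)\, e^{\frac{it(m-y)}{\sqrt n}} \equiv I_1(t) + I_2(t)$$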

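Because f is integrable and has moments of order 1 and 2, the first two derivatives of the
function $I_1(t)$ can be calculated by differentiating under the integral sign. Since
$\int_{-\infty}^{\mu} dy\, f(y) = 1/2$ (μ being the median), this allows us to write down the
following expansion:

$$I_1(t) = \frac12 + \frac{it}{\sqrt n}\int_{-\infty}^{\mu} dy\, f(y)\,(m-y) - \frac{t^2}{2n}\int_{-\infty}^{\mu} dy\, f(y)\,(m-y)^2 + o\Big(\frac1n\Big)$$

or

$$I_1(t) = \frac12 + \frac{it}{\sqrt n}\,\kappa - \frac{t^2}{2n}\,\sigma_1^2 + o\Big(\frac1n\Big)$$

where we have used the definitions $\kappa = \int_{-\infty}^{\mu} dy\, f(y)\,(m-y) = \frac{m}{2} - \int_{-\infty}^{\mu} dy\, f(y)\,y$ and
$\sigma_1^2 = \int_{-\infty}^{\mu} dy\, f(y)\,(y-m)^2$.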
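The constants κ, $\sigma_1^2$ and $\sigma_2^2$ are ordinary one-dimensional integrals that can be evaluated by quadrature. As an assumed example, not taken from the text, consider the exponential law $f(y) = e^{-y}$, $y \ge 0$, for which m = 1 and μ = ln 2; integrating by hand gives $\kappa = (\ln 2)/2$ and $\sigma_1^2 = (1 - (\ln 2)^2)/2$. The Python sketch below reproduces these values with a composite Simpson rule, truncating the infinite upper limit at an arbitrarily chosen y = 50.

```python
import math


def simpson(func, a: float, b: float, panels: int = 2000) -> float:
    """Composite Simpson rule on [a, b]; panels is rounded up to an even number."""
    if panels % 2:
        panels += 1
    h = (b - a) / panels
    total = func(a) + func(b)
    for i in range(1, panels):
        total += (4.0 if i % 2 else 2.0) * func(a + i * h)
    return total * h / 3.0


# Assumed example: exponential parent law f(y) = exp(-y) for y >= 0 (f = 0 below 0,
# so the lower limit -infinity can be replaced by 0)
def f(y: float) -> float:
    return math.exp(-y)


m = 1.0              # expectation value of the assumed law
mu = math.log(2.0)   # its distribution median
Y_MAX = 50.0         # truncation of the upper limit; the neglected tail is ~exp(-50)

kappa = simpson(lambda y: f(y) * (m - y), 0.0, mu)
sigma1_sq = simpson(lambda y: f(y) * (y - m) ** 2, 0.0, mu)
sigma2_sq = simpson(lambda y: f(y) * (y - m) ** 2, mu, Y_MAX, panels=20000)

if __name__ == "__main__":
    print("kappa    ", kappa, "expected", math.log(2.0) / 2)
    print("sigma1^2 ", sigma1_sq, "expected", (1 - math.log(2.0) ** 2) / 2)
    print("sum      ", sigma1_sq + sigma2_sq, "expected sigma^2 = 1")
```

For this law $\sigma^2 - 4\kappa^2 = 1 - (\ln 2)^2 > 0$, as required for the variance appearing in the final result.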

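The second y-integral in g(l,h) can be written:

$$\int_{\mu + \frac{h}{\sqrt n}}^{\infty} dy\, f(y)\, e^{\frac{it(m-y)}{\sqrt n}} = I'_1(t) - I_2(t)$$

with

$$I'_1(t) = \int_{\mu}^{\infty} dy\, f(y)\, e^{\frac{it(m-y)}{\sqrt n}} = \frac12 - \frac{it}{\sqrt n}\,\kappa - \frac{t^2}{2n}\,\sigma_2^2 + o\Big(\frac1n\Big)$$

and an obvious definition for $\sigma_2^2 = \int_{\mu}^{\infty} dy\, f(y)\,(y-m)^2$, so that $\sigma_1^2 + \sigma_2^2 = \sigma^2$. The linear
term is $-\kappa$ because $\int_{-\infty}^{\infty} dy\, f(y)\,(m-y) = 0$.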



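From these expressions, it is clear that in order to calculate the product $(I_1 + I_2)(I'_1 - I_2)$
to order 1/n, we only need $I_2$ to order $1/\sqrt n$, because $I_2$ enters the product only through
$I_2(I'_1 - I_1) - I_2^2$ and $I'_1 - I_1 = O(1/\sqrt n)$. This is immediate:
$I_2 = \frac{h f(\mu)}{\sqrt n} + \frac{1}{n}\tilde I_2$, where $\tilde I_2$ collects the remainder.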

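We therefore get the following expressions for the two y-integrals in g:

$$I_1 + I_2 = \frac12 + \frac{i\kappa t}{\sqrt n} + \frac{h f(\mu)}{\sqrt n} + \frac{\tilde I_2}{n} - \frac{\sigma_1^2 t^2}{2n} + o\Big(\frac1n\Big)$$

and

$$I'_1 - I_2 = \frac12 - \frac{i\kappa t}{\sqrt n} - \frac{h f(\mu)}{\sqrt n} - \frac{\tilde I_2}{n} - \frac{\sigma_2^2 t^2}{2n} + o\Big(\frac1n\Big)$$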

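The first-order terms and the $\tilde I_2$ terms disappear in the product which, to order 1/n, reads:

$$\int_{-\infty}^{\mu + \frac{h}{\sqrt n}} dy\, f(y)\, e^{\frac{it(m-y)}{\sqrt n}}\int_{\mu + \frac{h}{\sqrt n}}^{\infty} dy\, f(y)\, e^{\frac{it(m-y)}{\sqrt n}} = \frac14\left(1 - \frac{\sigma^2 t^2 + 4\big(h f(\mu) + i\kappa t\big)^2}{n} + o\Big(\frac1n\Big)\right)$$

The term between parentheses is an asymptotic expansion valid for small $t/\sqrt n$. However,
when it is raised to the power $r = \frac{n-1}{2}$, the missing terms of higher order in $1/\sqrt n$ give no contribution
in the $n \to \infty$ limit, which is thus valid for any t and reads:

$$\frac{1}{4^r}\exp\left(-\frac{\sigma^2 t^2}{2} - 2\big(h f(\mu) + i\kappa t\big)^2\right) = \frac{1}{4^r}\, e^{-\frac{(\sigma^2 - 4\kappa^2)\,t^2}{2} - 4i\kappa h f(\mu)\,t}\; e^{-2 f(\mu)^2 h^2}$$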



The first exponential (with the sign of t reversed) is recognized as the Fourier transform of
a gaussian distribution of variance $\sigma^2 - 4\kappa^2$ and expectation value $4\kappa h f(\mu)$.
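Two facts needed in the next step can be made explicit; this is a short worked addition in the notation above, not part of the original text. The t-integration is the standard gaussian integral, and the variance $\sigma^2 - 4\kappa^2$ cannot be negative: since $\int_\mu^\infty dy\, f(y)(m-y) = -\kappa$, the quantity $2\kappa$ is an average of $(m-y)\,\mathrm{sgn}(\mu-y)$, to which the Cauchy-Schwarz inequality applies.

$$\frac{1}{2\pi}\int_{-\infty}^{\infty} dt\, e^{it(l-a)}\, e^{-\frac{s^2t^2}{2}} = \frac{1}{\sqrt{2\pi s^2}}\, e^{-\frac{(l-a)^2}{2s^2}}, \qquad a = 4\kappa h f(\mu),\quad s^2 = \sigma^2 - 4\kappa^2,$$

$$2\kappa = \int_{-\infty}^{\infty} dy\, f(y)\,(m-y)\,\mathrm{sgn}(\mu - y) \;\Longrightarrow\; 4\kappa^2 \le \int_{-\infty}^{\infty} dy\, f(y)\,(m-y)^2 \int_{-\infty}^{\infty} dy\, f(y) = \sigma^2.$$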
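Therefore, changing $t$ to $-t$ (to perform the inverse transform), replacing $f(\mu + h/\sqrt n)$ by $f(\mu)$, and
neglecting terms with negative powers of n in the argument of the exponential yields, after integration over t:

$$g(l,h) \simeq \frac{n!}{4^r (r!)^2\sqrt n}\, f(\mu)\, e^{-2 f(\mu)^2 h^2}\;\frac{1}{\sqrt{2\pi(\sigma^2 - 4\kappa^2)}}\exp\left(-\frac{\big(l - 4\kappa h f(\mu)\big)^2}{2(\sigma^2 - 4\kappa^2)}\right)$$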

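On the other hand, use of Stirling's formula to treat the numerical coefficient
$\frac{n!}{4^r (r!)^2\sqrt n}$, with $n = 2r + 1$, shows that it tends to $\sqrt{2/\pi}$, so that the prefactor of the
exponentials tends to $\sqrt{2/\pi}\, f(\mu)$. Since $\int dh\, e^{-2 f(\mu)^2 h^2} = \sqrt{\pi/2}/f(\mu)$, this checks the
normalisation of our asymptotic joint distribution.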
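The limit of the numerical coefficient can be checked directly. The Python sketch below is our illustration, not part of the original derivation; the name `coefficient` is hypothetical. It evaluates $n!/(4^r (r!)^2\sqrt n)$ for $n = 2r+1$ through `math.lgamma`, which avoids overflowing the factorials, and compares it with $\sqrt{2/\pi}$.

```python
import math


def coefficient(r: int) -> float:
    """n! / (4**r (r!)**2 sqrt(n)) for n = 2r + 1, evaluated in log space."""
    n = 2 * r + 1
    log_c = (math.lgamma(n + 1) - r * math.log(4.0)
             - 2.0 * math.lgamma(r + 1) - 0.5 * math.log(n))
    return math.exp(log_c)


LIMIT = math.sqrt(2.0 / math.pi)  # value predicted by Stirling's formula

if __name__ == "__main__":
    for r in (1, 10, 100, 1000, 100_000):
        print(r, coefficient(r), coefficient(r) / LIMIT)
```

Expanding Stirling's formula one order further suggests that the ratio behaves roughly like $1 + 1/(8r)$, so the normalisation is already accurate at the per-cent level for r of order ten.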



B. Result 



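Overall, we have found that

$$g(l,h) = \frac{2 f(\mu)}{\sqrt{2\pi}}\, e^{-2 f(\mu)^2 h^2}\;\frac{1}{\sqrt{2\pi(\sigma^2 - 4\kappa^2)}}\exp\left(-\frac{\big(l - 4\kappa h f(\mu)\big)^2}{2(\sigma^2 - 4\kappa^2)}\right)$$

which shows that the two random variables H and $L - 4\kappa f(\mu) H$ are gaussian distributed
and independent, with zero expectation value and variances $\frac{1}{4 f(\mu)^2}$ and $\sigma^2 - 4\kappa^2$ respectively.

From here one finds $V[L] = \sigma^2$, as expected, and $\mathrm{Cov}[H, L] = \kappa/f(\mu)$, a possibly new result.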



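For a finite but large sample, $\bar X$ and M are approximately normal, with E.V.'s m and μ
and covariance matrix:

$$\frac{1}{n}\begin{pmatrix} \sigma^2 & \kappa/f(\mu) \\ \kappa/f(\mu) & \dfrac{1}{4 f(\mu)^2} \end{pmatrix}$$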
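This covariance matrix lends itself to a quick Monte Carlo check. As an assumed test case, take the exponential law $f(y) = e^{-y}$: then m = 1, $\sigma^2 = 1$, μ = ln 2, f(μ) = 1/2 and κ = (ln 2)/2, so the prediction is $\mathrm{Cov}[H, L] = \kappa/f(\mu) = \ln 2 \approx 0.693$, with $V[L] = V[H] = 1$. The sketch below uses only the Python standard library and a hypothetical function name; agreement can only be expected up to Monte Carlo noise and finite-n corrections.

```python
import math
import random
import statistics


def mean_median_covariance(n: int, reps: int, seed: int = 12345):
    """Monte Carlo estimates of Cov[H, L], V[L] and V[H] for Exp(1) samples of size n.

    L = sqrt(n) (Xbar - m) and H = sqrt(n) (M - mu), with m = 1 and mu = ln 2
    for the (assumed) exponential parent law f(y) = exp(-y).
    """
    if n % 2 == 0:
        raise ValueError("n must be odd (n = 2r + 1)")
    rng = random.Random(seed)
    m, mu = 1.0, math.log(2.0)
    root_n = math.sqrt(n)
    ls, hs = [], []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        ls.append(root_n * (statistics.fmean(sample) - m))
        hs.append(root_n * (statistics.median(sample) - mu))
    mean_l, mean_h = statistics.fmean(ls), statistics.fmean(hs)
    cov = sum((a - mean_l) * (b - mean_h) for a, b in zip(ls, hs)) / (reps - 1)
    return cov, statistics.variance(ls), statistics.variance(hs)


if __name__ == "__main__":
    cov, var_l, var_h = mean_median_covariance(n=201, reps=2000)
    print(f"Cov[H, L] ~ {cov:.3f}  (prediction ln 2 = {math.log(2):.3f})")
    print(f"V[L] ~ {var_l:.3f}, V[H] ~ {var_h:.3f}  (predictions 1 and 1)")
```

The exponential law is convenient here because it is skewed, so that κ/f(μ) differs visibly from the symmetric-law value.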

VII. CONCLUSIONS 

There are two major reasons which explain the predominance of the expectation value and 
the mean in the definition and estimation of a central value for a probability distribution. 
The first is the linearity of the barycentric processes, the second is the asymptotic ubiquity
of the Laplace-Gauss law, for which the expectation value is well defined, the sample mean
being the 'best' (in various ways) estimator thereof. However, the sample mean is of no help 
for the estimation of the central value of more dispersed distributions like Cauchy's and can 
be seriously flawed by the contamination of a standard sample by outliers. In that case, the 
median which is by construction far less sensitive to outliers can be a better choice. It has 
several fine qualities but also several drawbacks, which have been reviewed to some extent.
However, calculating the moments of the median in order to estimate its average value or its
uncertainty, as is often attempted, has been shown to answer the wrong question.
Moments and quantiles are two different ways of trying to sum up the complexity of a full
probability distribution with a few numbers. The expectation value and the variance belong
to the first approach, whilst the median belongs to the second. It is therefore clear that
the sample median must be used for estimating the distribution median without reference to
expectation values, and that interval estimation of this median is most simply done by using
quantiles, or by approximating intervals of given probability content for the sample median
through the dispersion measures which have been suggested above.



VIII. APPENDIX 



On the impossibility of an unbiased ratio estimator when both terms have errors.

The problem of building an unbiased estimator for a ratio amounts to finding a function f of two variables such that

$$E[f(X, Y)] = \frac{E[Y]}{E[X]}$$

for any two independent random variables X and Y.

Since X and Y can be taken as having fixed (arbitrary) values $x_0$ and $y_0$ (in which case their densities are Dirac δ's),
eq. [7] implies that

$$f(x, y) = y/x \qquad \forall (x, y) \in \mathbb{R}^* \times \mathbb{R}$$

However, this would impose $E[Y/X] = E[Y]/E[X]$, hence (taking $Y = 1$) $E[1/X] = 1/E[X]$ for any random variable X for which both
sides make sense, and this is well known to be false, as can be shown by elementary examples.
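One such elementary example, chosen by us for concreteness: let X take the values 1 and 3 with probability 1/2 each. Then

$$E\left[\frac{1}{X}\right] = \frac12\left(1 + \frac13\right) = \frac23 \neq \frac12 = \frac{1}{E[X]}.$$

More generally, since $x \mapsto 1/x$ is convex on the positive reals, Jensen's inequality gives $E[1/X] > 1/E[X]$ for any positive, non-degenerate X.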



On the other hand, what is required of E[1/X] corresponds to what we have called the invariance property of the median
($M[1/X] = 1/M[X]$), and one could therefore hope to solve the problem using this statistic rather than the mean.

But by the same reasoning used for the mean, solving $M[f(X, Y)] = M[Y]/M[X]$ leads to $f(x, y) = y/x$. However, it is
not true that $M[Y/X] = M[Y]/M[X]$ in general; for example, the ratio of independent Γ(2, 1) and Γ(1, 1) random variables
has median $1 + \sqrt 2 \approx 2.414$ instead of the expected $\approx 2.421$, which is the ratio of the medians.
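The numbers quoted for this Γ example can be reproduced with a few lines of Python. The sketch is our own illustration: it uses the Γ(2, 1) c.d.f. $1 - e^{-y}(1+y)$, the median ln 2 of Γ(1, 1), and the fact that Y/X then follows a beta-prime law with c.d.f. $(z/(1+z))^2$; the helper `bisect_root` is a hypothetical name, and all medians are found by bisection.

```python
import math


def bisect_root(func, lo: float, hi: float, tol: float = 1e-14) -> float:
    """Root of an increasing function on [lo, hi], found by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if func(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# X ~ Gamma(1, 1), i.e. exponential: median ln 2
med_x = math.log(2.0)
# Y ~ Gamma(2, 1): c.d.f. 1 - exp(-y) (1 + y)
med_y = bisect_root(lambda y: 1.0 - math.exp(-y) * (1.0 + y) - 0.5, 0.0, 20.0)
# Y/X for independent X, Y: beta-prime(2, 1) law with c.d.f. (z / (1 + z))**2
med_ratio = bisect_root(lambda z: (z / (1.0 + z)) ** 2 - 0.5, 0.0, 100.0)

if __name__ == "__main__":
    print("median of Y/X   :", med_ratio)  # analytically 1 + sqrt(2)
    print("ratio of medians:", med_y / med_x)
```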

Therefore the median does not solve the problem of a distribution-free, strictly unbiased estimator of a ratio in the above
sense.



* Also at home. 

† Electronic address: jmlevy@in2p3.fr






The same term, median, is used for the probability distribution central value and for the sample
central value when central is thought of as 'splitting into equal halves', whereas there are two 
terms, mean (or average) for the sample and expectation value for the distribution when central 
is understood as 'barycentric'. We shall therefore consistently use the expressions 'distribution 
median' and 'sample median' to avoid confusion. 

B(k, l) stands for $\frac{\Gamma(k)\Gamma(l)}{\Gamma(k+l)}$. Therefore $F(Y_k)$ follows a $\beta(k, n-k+1)$ probability distribution,
the c.d.f. of which is the corresponding normalized incomplete β integral.

In principle, it is possible to have F(x) = 1/2 for a whole range of values of x. This means that
the p.d.f. f(x) = 0 in that range, which is usually very unrealistic in physical problems, even
though analytical examples are easy to build.

M.G. Kendall and A. Stuart, The Advanced Theory of Statistics, 2nd edition, Vol. 2, §32.8

As a counter-example, it is enough to calculate numerical approximations for the medians of $\chi^2_n$
distributions. For n = 1, 2, 3 one finds respectively .45, 1.38 and 2.36, to better than .01; however,
$2 \times .45 \neq 1.38$ and $.45 + 1.38 \neq 2.36$.
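These three medians are easy to reproduce. The Python sketch below is an illustration of ours, assuming only the closed-form χ² c.d.f.'s for 1, 2 and 3 degrees of freedom (built from `math.erf` and `math.exp`); each median is located by bisection, and the function names are hypothetical.

```python
import math


def chi2_cdf(x: float, k: int) -> float:
    """Closed-form chi-square c.d.f. for k = 1, 2, 3 degrees of freedom."""
    if k == 1:
        return math.erf(math.sqrt(x / 2.0))
    if k == 2:
        return 1.0 - math.exp(-x / 2.0)
    if k == 3:
        return (math.erf(math.sqrt(x / 2.0))
                - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))
    raise ValueError("only k = 1, 2, 3 are implemented here")


def chi2_median(k: int, tol: float = 1e-12) -> float:
    """Median of chi-square with k d.o.f., by bisection on [0, 20]."""
    lo, hi = 0.0, 20.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


m1, m2, m3 = (chi2_median(k) for k in (1, 2, 3))

if __name__ == "__main__":
    print(m1, m2, m3)
    print("2*m1 =", 2 * m1, "vs m2;", "m1 + m2 =", m1 + m2, "vs m3")
```

The case n = 2 has the exact median 2 ln 2, which gives a convenient check of the bisection.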

θ(x) stands for the Heaviside step function.

M.G. Kendall and A. Stuart, op. cit., §32.8-32.9

The problem is entirely different from the usual least-squares fit, wherein the input variables are
assumed to be perfectly known.

The constrained fit is equivalent to a least-squares fit wherein the weights (inverse variances) are
calculated as functions of the parameter to be fitted.

The 68% choice is made by analogy with the probability content of a one-σ half-width interval
for a gaussian distributed random variable.

M.G. Kendall and A. Stuart, The Advanced Theory of Statistics, 2nd edition, Vol. 1, Ch. 14



